Morningstar Build: Additional Details
==============

I. INTRODUCTION
--------------

This file provides additional documentation for the Morningstar build process, which imports the mutual fund and ETF holdings data provided by Morningstar. The output of this build is a series of clean data files of manageable size that are used by the downstream code.

II. EXECUTING THE BUILD
--------------

The bash scripts `Morningstar_Build_PreBloomberg.sh` and `Morningstar_Build_PostBloomberg.sh` are the main executable files. Running both files, in that order, executes the full Morningstar build from start to finish. `Morningstar_Build_PreBloomberg.sh` executes the steps before the manual break for the Bloomberg terminal queries, and `Morningstar_Build_PostBloomberg.sh` executes the steps after the break. Each step of the build is outlined in the bash files with a short description of its function. Descriptions can also be found in Section III of this file.

These files should be called as: `sh Morningstar_Build_PreBloomberg.sh <WHOAMI> all` and `sh Morningstar_Build_PostBloomberg.sh <WHOAMI> all`.
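
For anyone driving the build from another program, the two calls above can be wrapped as in the sketch below. Only the script names and the `sh <script> <WHOAMI> all` call pattern come from this file; the helper names `build_command` and `run_stage` and the `dry_run` switch are hypothetical and exist only for this illustration.

```python
import subprocess

# Script names are taken from Section II; everything else is illustrative.
PRE_SCRIPT = "Morningstar_Build_PreBloomberg.sh"
POST_SCRIPT = "Morningstar_Build_PostBloomberg.sh"


def build_command(script, whoami, target="all"):
    """Return the argument list for one stage: sh <script> <WHOAMI> <target>."""
    return ["sh", script, whoami, target]


def run_stage(script, whoami, dry_run=True):
    """Run one stage, or only print its command when dry_run is True."""
    cmd = build_command(script, whoami)
    if dry_run:
        print(" ".join(cmd))
        return cmd
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(cmd, check=True)
    return cmd
```

A run would call `run_stage(PRE_SCRIPT, whoami, dry_run=False)`, pause for the manual Bloomberg terminal queries, and then call `run_stage(POST_SCRIPT, whoami, dry_run=False)`. Keeping the two stages as separate calls mirrors the manual break between them.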

III. BUILD DETAILS
--------------

1. `Read XML`: This step imports the raw Morningstar data, which is delivered in XML format, and converts it into Stata DTA format. The output is a collection of DTA files that contain the same information as the raw XML files delivered by Morningstar.

2. `Morningstar Mapping Build`: This step creates a crosswalk that connects the different fund identification codes that arise in the data because of format changes. The output is a fund-level crosswalk, plus linked fund metadata, which is used to generate consistent holdings data across the different format versions.

3. `ER Data`: This step creates exchange rate data from the IMF's International Financial Statistics (IFS) database. The output is bilateral exchange rate data, which is used as supplemental information for the Morningstar data.

4. `PortfolioSummary Build`: This step creates a clean version of the summary mutual fund and ETF holdings data delivered by Morningstar. The output is a series of clean files related to summary holdings which will be used in the later build steps.

5. `HoldingDetail Build (Raw Stage)`: This step processes the raw monthly files starting from April 2020, when the new format was introduced.

6. `HoldingDetail Build (Clean Stage)`: This step generates a clean version of the detailed mutual fund and ETF holdings data delivered by Morningstar. In this stage, the Holding Detail position-level data is merged with several data sources, including the mapping file and the exchange rate data. The output is a series of clean detailed-holdings files, which are used in the later build steps.

7. `Refine Parse Externalid`: This step cleans the externalid field in the Morningstar holdings data to create a new cleaned field called `externalid_mns`. This field is used to identify securities when a CUSIP is missing.

8. `Externalid (Pre-Bloomberg Stage)`: This step processes the externalid data before the manual break. It creates an externalid list and then uses that list to obtain OpenFIGI data via the API. This step involves an R script that must be run on a login node. Note that you need to stay connected to the server until the R script has finished completely.

9. `Externalid (Manual Break: Bloomberg)`: This step is the manual break for the Bloomberg terminal queries. It takes place after the R script has obtained the OpenFIGI data via the API using the externalid list created earlier.

10. `Externalid (Post-Bloomberg Stage)`: This step uses the OpenFIGI data, together with the corresponding raw data pulled from the Bloomberg terminal, to match against the CUSIP and ISIN master files produced by the present build. It then creates a link between `externalid_mns` and the CUSIP/ISIN master files. It also incorporates all security-level details for each externalid.

11. `Refine Cusip Fill Isin`: This step performs a series of data cleaning steps that improve the quality of the security metadata (i.e., CUSIP and ISIN) by merging in information from the CGS security master files and the OpenFIGI/Bloomberg data pull.

12. `Externalid_make`: This step generates an internal flatfile which has all security-level details for each externalid in the Morningstar holdings data. This file will be used later to improve security characteristics of the Morningstar data.

13. `Refine Extid Merge`: This step improves the quality of security information included in the Morningstar data by merging information from the internally-generated externalid master file into the holdings data.

14. `Internal Currency`: This step constructs a dataset with the modal currency assignment for each fund in the Morningstar data.

15. `Refine Cusip Merge`: This step merges security-level data from the CUSIP Global Services (CGS) master files into the holding details data.

16. `Internal Class`: This step finds the modal typecode assigned to each fund in the Morningstar data. 

17. `Manual Corrections`: This step cleans several extraordinarily large positions in the holding details data. 

18. `Create Final Files`: This step generates the final holding detail files prior to the fund unwind step.

19. `Unwind MF Prepare`: This step reads in the monthly holding detail datasets (pre-unwind) and creates temporary files for the later steps, which unwind fund-in-fund positions.

20. `Unwind MF Positions Unravel`: This step handles the actual unraveling of positions of funds investing in other funds. Positions are attributed to the ultimate holding fund, and the positions of the investing fund are scaled back accordingly.

21. `Unwind MF Positions Consolidate`: This step consolidates the temporary files generated by the Unravel step.

22. `Unwind MF Positions Re-generate HD`: This step re-generates the holding detail files so that they reflect the unraveling and rescaling of positions.

23. `Unwind MF Positions Aggregate`: This step aggregates the half-year-frequency temporary files generated in earlier steps and generates the yearly holding detail files.

24. `panelify/*`: These steps reshape the data into a constant panel version and construct flow measures at the fund-security level.
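
Steps 14 and 16 both assign a modal value to each fund (a currency and a typecode, respectively). The sketch below shows one way such an assignment could be computed. It is not the build's own code: the function name `modal_value_by_fund`, the column names `fund_id` and `currency`, the unweighted counting of positions, and the alphabetical tie-break are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def modal_value_by_fund(rows, fund_key="fund_id", value_key="currency"):
    """Return {fund: most frequent non-missing value} for a list of row dicts.

    Assumptions (not taken from the build): every position counts once,
    missing or empty values are skipped, and ties are broken alphabetically.
    """
    counts = defaultdict(Counter)
    for row in rows:
        value = row.get(value_key)
        if value is None or value == "":
            continue  # skip positions with no recorded value
        counts[row[fund_key]][value] += 1

    result = {}
    for fund, counter in counts.items():
        # Highest count first; among equal counts, the alphabetically first value.
        result[fund] = min(counter.items(), key=lambda kv: (-kv[1], kv[0]))[0]
    return result


# Hypothetical position-level rows.
positions = [
    {"fund_id": "F1", "currency": "USD"},
    {"fund_id": "F1", "currency": "USD"},
    {"fund_id": "F1", "currency": "EUR"},
    {"fund_id": "F2", "currency": "JPY"},
    {"fund_id": "F2", "currency": ""},
]
modal_currency = modal_value_by_fund(positions)
```

The same function covers the typecode case by passing `value_key="typecode"`. Weighting each position by its market value instead of counting it once would be a natural variant, but this file does not say which rule the build uses.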
